AITopics | relative entropy

relative entropy

Information Theory and Statistical Learning

Gamal, Abbas El

arXiv.org Machine Learning, May-6-2026

This manuscript contains a preprint of a chapter under consideration for inclusion in the forthcoming third edition of Cover and Thomas's Elements of Information Theory, posted with permission from Wiley. The table of contents of the new edition (EIT-3 ToC) is available at https://docs.google.com/document/d/1L-m4oQEJw1PJhoxBeMwrrBD8S_HmvzMEkPbYvS24980/edit?usp=sharing and feedback can be sent to abbas@ee.stanford.edu.

Learning and information theory intersect in both model training and the characterization of fundamental performance limits. This manuscript provides a concise and accessible treatment of the first intersection, requiring only basic background in information theory and statistics at the senior undergraduate or first-year graduate level. End-of-chapter exercises make the material well suited for classroom use as well as self-study. The chapter focuses on the role of divergence measures in model training, with examples ranging from linear and logistic regression to autoregressive models, variational autoencoders, diffusion models, generative adversarial networks, and score-based models. It introduces the evidence lower bound (ELBO), $f$-divergences, and the Fisher divergence. In particular, the treatment of the generative diffusion model provides a more systematic and explicit derivation than is typical in the literature.
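As a small illustration of the divergence measures named in this abstract, the sketch below evaluates an $f$-divergence on a finite alphabet and checks that the generator f(t) = t log t recovers relative entropy (KL divergence). It is not taken from the chapter; the function names and the two example distributions are assumptions made for illustration, and the code assumes q(x) > 0 everywhere.

```python
import math

def f_divergence(p, q, f):
    """D_f(P||Q) = sum_x q(x) * f(p(x)/q(x)) for finite distributions with q(x) > 0."""
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))

def kl_generator(t):
    # f(t) = t log t, using the convention 0 log 0 = 0
    return t * math.log(t) if t > 0 else 0.0

def tv_generator(t):
    # f(t) = |t - 1| / 2 gives the total variation distance
    return 0.5 * abs(t - 1.0)

def relative_entropy(p, q):
    """KL(P||Q) in nats, computed directly from its definition."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical example distributions on a three-letter alphabet.
p = [0.5, 0.3, 0.2]
q = [0.25, 0.25, 0.5]

kl_direct = relative_entropy(p, q)
kl_via_f = f_divergence(p, q, kl_generator)   # same value as kl_direct
tv = f_divergence(p, q, tv_generator)         # half the L1 distance between p and q
```

Swapping the generator changes the divergence without touching the rest of the code, which is the main convenience of the $f$-divergence view.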

artificial intelligence, fdata, machine learning, (17 more...)

arXiv.org Machine Learning

2605.02989

Country:

Europe (0.46)
North America > United States > California > Santa Clara County > Palo Alto (0.24)

Genre:

Research Report > New Finding (0.34)
Research Report > Experimental Study (0.34)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Compression with Bayesian Implicit Neural Representations

Neural Information Processing Systems, Apr-24-2026, 09:15:44 GMT

Many common types of data can be represented as functions that map coordinates to signal values, such as pixel locations to RGB values in the case of an image. Based on this view, data can be compressed by overfitting a compact neural network to its functional representation and then encoding the network weights. However, most current solutions for this are inefficient, as quantization to low-bit precision substantially degrades the reconstruction quality. To address this issue, we propose overfitting variational Bayesian neural networks to the data and compressing an approximate posterior weight sample using relative entropy coding instead of quantizing and entropy coding it. This strategy enables direct optimization of the rate-distortion performance by minimizing the β-ELBO, and makes it possible to target different rate-distortion trade-offs for a given network architecture by adjusting β. Moreover, we introduce an iterative algorithm for learning prior weight distributions and employ a progressive refinement process for the variational posterior that significantly enhances performance. Experiments show that our method achieves strong performance on image and audio compression while retaining simplicity.
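To make the rate-distortion reading of the β-ELBO concrete, here is a minimal sketch that assumes a diagonal Gaussian posterior q and prior p over the weights and an objective of the form distortion + β · KL(q || p). That form, the function names, and all numbers are illustrative assumptions, not the paper's implementation. The KL term acts as the rate, since relative entropy coding aims to transmit a posterior sample at a cost close to KL(q || p).

```python
import math

def gaussian_kl(mu_q, sigma_q, mu_p, sigma_p):
    """KL(q || p) in nats between diagonal Gaussians given as per-weight lists."""
    total = 0.0
    for mq, sq, mp, sp in zip(mu_q, sigma_q, mu_p, sigma_p):
        total += math.log(sp / sq) + (sq ** 2 + (mq - mp) ** 2) / (2.0 * sp ** 2) - 0.5
    return total

def beta_elbo(distortion, kl_nats, beta):
    """Assumed objective: expected distortion plus beta times the KL (rate) term."""
    return distortion + beta * kl_nats

# Hypothetical two-weight posterior and a standard normal prior.
mu_q, sigma_q = [0.8, -0.3], [0.1, 0.2]
mu_p, sigma_p = [0.0, 0.0], [1.0, 1.0]

rate_nats = gaussian_kl(mu_q, sigma_q, mu_p, sigma_p)
rate_bits = rate_nats / math.log(2)   # approximate coding cost in bits

distortion = 0.02                     # hypothetical reconstruction error
loss_low_beta = beta_elbo(distortion, rate_nats, beta=1e-4)
loss_high_beta = beta_elbo(distortion, rate_nats, beta=1e-2)
```

A larger β penalizes the rate term more heavily, so training under it would favor posteriors closer to the prior (fewer bits) at the expense of reconstruction quality.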

artificial intelligence, compression, machine learning, (14 more...)

Neural Information Processing Systems

Country: Asia (0.28)

Genre: Research Report (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Tighter Expected Generalization Error Bounds via Wasserstein Distance

Neural Information Processing Systems, Feb-10-2026, 07:30:55 GMT

This work presents several expected generalization error bounds based on the Wasserstein distance.

artificial intelligence, generalization error, machine learning, (14 more...)

Neural Information Processing Systems

Country:

Europe > Sweden > Stockholm > Stockholm (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

A Bayesian Nonparametrics View into Deep Representations

Neural Information Processing Systems, Feb-7-2026, 12:07:28 GMT

We investigate neural network representations from a probabilistic perspective.

artificial intelligence, machine learning, representation, (18 more...)

Neural Information Processing Systems

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
North America > Canada > Quebec > Montreal (0.05)
Europe > Poland > Lesser Poland Province > Kraków (0.05)
(8 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.51)

060b2af0081a460f7f466f7f174d9052-Paper-Conference.pdf

Neural Information Processing Systems, Feb-7-2026, 11:13:12 GMT

combiner, compression, posterior, (14 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Asia > China (0.04)
North America > United States > Massachusetts (0.04)
Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)

Genre: Research Report (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.67)

Empirical Risk Minimization with $f$-Divergence Regularization

Daunas, Francisco, Esnaola, Iñaki, Perlaza, Samir M., Poor, H. Vincent

arXiv.org Machine Learning, Jan-21-2026

In this paper, the solution to the empirical risk minimization problem with $f$-divergence regularization (ERM-$f$DR) is presented and conditions under which the solution also serves as the solution to the minimization of the expected empirical risk subject to an $f$-divergence constraint are established. The proposed approach extends applicability to a broader class of $f$-divergences than previously reported and yields theoretical results that recover previously known results. Additionally, the difference between the expected empirical risk of the ERM-$f$DR solution and that of its reference measure is characterized, providing insights into previously studied cases of $f$-divergences. A central contribution is the introduction of the normalization function, a mathematical object that is critical in both the dual formulation and practical computation of the ERM-$f$DR solution. This work presents an implicit characterization of the normalization function as a nonlinear ordinary differential equation (ODE), establishes its key properties, and subsequently leverages them to construct a numerical algorithm for approximating the normalization factor under mild assumptions. Further analysis demonstrates structural equivalences between ERM-$f$DR problems with different $f$-divergences via transformations of the empirical risk. Finally, the proposed algorithm is used to compute the training and test risks of ERM-$f$DR solutions under different $f$-divergence regularizers. This numerical example highlights the practical implications of choosing different functions $f$ in ERM-$f$DR problems.
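For intuition, the best-known special case is the one where the $f$-divergence is relative entropy: on a finite model set, minimizing E_P[L] + λ KL(P || Q) over P is solved by the Gibbs measure P*(θ) ∝ Q(θ) exp(−L(θ)/λ), and the normalizing constant determines the optimal value. The sketch below checks this numerically. It covers only that KL special case, not the paper's general algorithm, and the risks, reference measure, and λ are made-up values.

```python
import math

def gibbs_solution(risks, reference, lam):
    """Minimizer of E_P[L] + lam * KL(P || Q) over distributions P on a finite set."""
    weights = [q * math.exp(-r / lam) for r, q in zip(risks, reference)]
    z = sum(weights)  # normalizing constant
    return [w / z for w in weights]

def regularized_objective(p, risks, reference, lam):
    """Expected empirical risk plus lam times the relative entropy to the reference."""
    expected_risk = sum(pi * r for pi, r in zip(p, risks))
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, reference) if pi > 0)
    return expected_risk + lam * kl

# Hypothetical empirical risks of three models and a reference measure Q.
risks = [0.9, 0.4, 0.1]
reference = [0.5, 0.3, 0.2]
lam = 0.5

p_star = gibbs_solution(risks, reference, lam)
optimal_value = regularized_objective(p_star, risks, reference, lam)

# Closed form of the optimum: -lam * log E_Q[exp(-L / lam)].
closed_form = -lam * math.log(
    sum(q * math.exp(-r / lam) for r, q in zip(risks, reference))
)
value_at_reference = regularized_objective(reference, risks, reference, lam)
```

The Gibbs solution shifts mass from the reference measure toward low-risk models, and λ controls how far it is allowed to move.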

artificial intelligence, bayesian inference, machine learning, (17 more...)

arXiv.org Machine Learning

2601.13191

Country:

Asia (0.92)
Europe > United Kingdom > England (0.28)
North America > United States > California > Los Angeles County (0.27)

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Vision (0.92)
Information Technology > Data Science (0.92)
(2 more...)

Optimal Anytime-Valid Tests for Composite Nulls

Shekhar, Shubhanshu

arXiv.org Machine Learning, Dec-24-2025

We consider the problem of designing optimal level-$α$ power-one tests for composite nulls. Given a parameter $α\in (0,1)$ and a stream of $\mathcal{X}$-valued observations $\{X_n: n \geq 1\} \overset{i.i.d.}{\sim} P$, the goal is to design a level-$α$ power-one test $τ_α$ for the null $H_0: P \in \mathcal{P}_0 \subset \mathcal{P}(\mathcal{X})$. Prior works have shown that any such $τ_α$ must satisfy $\mathbb{E}_P[τ_α] \geq \tfrac{\log(1/α)}{γ^*(P, \mathcal{P}_0)}$, where $γ^*(P, \mathcal{P}_0)$ is the so-called $\mathrm{KL}_{\inf}$ or minimum divergence of $P$ to the null class. In this paper, our objective is to develop and analyze constructive schemes that match this lower bound as $α\downarrow 0$. We first consider the finite-alphabet case ($|\mathcal{X}| = m < \infty$), and show that a test based on a universal $e$-process (formed by the ratio of a universal predictor and the running null MLE) is optimal in the above sense. The proof relies on a Donsker-Varadhan (DV) based saddle-point representation of $\mathrm{KL}_{\inf}$, and an application of Sion's minimax theorem. This characterization motivates a general method for arbitrary $\mathcal{X}$: construct an $e$-process based on the empirical solutions to the saddle-point representation over a sufficiently rich class of test functions. We give sufficient conditions for the optimality of this test for compact convex nulls, and verify them for Hölder smooth density models. We end the paper with a discussion on the computational aspects of implementing our proposed tests in some practical settings.
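The two ingredients named in this abstract can be checked numerically in the simplest setting. The sketch below assumes a simple null $\mathcal{P}_0 = \{Q\}$ on a binary alphabet, so that $\mathrm{KL}_{\inf}$ reduces to KL(P || Q). It verifies the Donsker-Varadhan representation KL(P || Q) = sup_f { E_P[f] − log E_Q[e^f] }, whose supremum is attained at f = log(p/q), and evaluates the lower bound log(1/α)/KL(P || Q). The distributions and α are hypothetical, and the composite-null case treated in the paper is not covered.

```python
import math

def kl(p, q):
    """Relative entropy KL(P||Q) in nats on a finite alphabet."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def dv_objective(f, p, q):
    """Donsker-Varadhan objective E_P[f] - log E_Q[exp(f)] for a test function f."""
    mean_under_p = sum(pi * fi for pi, fi in zip(p, f))
    log_mgf_under_q = math.log(sum(qi * math.exp(fi) for qi, fi in zip(q, f)))
    return mean_under_p - log_mgf_under_q

# Hypothetical true distribution P and simple null Q.
p = [0.6, 0.4]
q = [0.5, 0.5]

f_star = [math.log(pi / qi) for pi, qi in zip(p, q)]  # log-likelihood ratio
dv_at_optimum = dv_objective(f_star, p, q)            # matches kl(p, q)
dv_at_other = dv_objective([1.0, -1.0], p, q)         # any other f gives a smaller value

alpha = 0.05
expected_samples_lower_bound = math.log(1.0 / alpha) / kl(p, q)
```

Because the divergence sits in the denominator, alternatives that are close to the null require many more samples on average before any level-α test can stop.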

assumption 1, denote, relative entropy, (13 more...)

arXiv.org Machine Learning

2512.20039

Country:

North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)
North America > United States > California (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Performance Guarantees for Quantum Neural Estimation of Entropies

Sreekumar, Sreejith, Goldfeld, Ziv, Wilde, Mark M.

arXiv.org Artificial Intelligence, Nov-25-2025

Estimating quantum entropies and divergences is an important problem in quantum physics, information theory, and machine learning. Quantum neural estimators (QNEs), which utilize a hybrid classical-quantum architecture, have recently emerged as an appealing computational framework for estimating these measures. Such estimators combine classical neural networks with parametrized quantum circuits, and their deployment typically entails tedious tuning of hyperparameters controlling the sample size, network architecture, and circuit topology. This work initiates the study of formal guarantees for QNEs of measured (Rényi) relative entropies in the form of non-asymptotic error risk bounds. We further establish exponential tail bounds showing that the error is sub-Gaussian, and thus sharply concentrates about the ground truth value. For an appropriate sub-class of density operator pairs on a space of dimension $d$ with bounded Thompson metric, our theory establishes a copy complexity of $O(|Θ(\mathcal{U})|d/ε^2)$ for QNE with a quantum circuit parameter set $Θ(\mathcal{U})$, which has minimax optimal dependence on the accuracy $ε$. Additionally, if the density operator pairs are permutation invariant, we improve the dimension dependence above to $O(|Θ(\mathcal{U})|\mathrm{polylog}(d)/ε^2)$. Our theory aims to facilitate principled implementation of QNEs for measured relative entropies and guide hyperparameter tuning in practice.

artificial intelligence, entropy, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2511.19289

Country: North America > United States (0.67)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Quantum Information Ordering and Differential Privacy

Dasgupta, Ayanava, Warsi, Naqueeb Ahmad, Hayashi, Masahito

arXiv.org Artificial Intelligence, Nov-14-2025

We study quantum differential privacy (QDP) by defining a notion of the order of informativeness between two pairs of quantum states. In particular, we show that if the hypothesis testing divergence of one pair dominates that of the other pair, then this dominance holds for every f-divergence. This approach completely characterizes (ε,δ)-QDP mechanisms by identifying the most informative (ε,δ)-DP quantum state pairs. We apply this to analyze the stability of quantum differentially private learning algorithms, generalizing classical results to the case δ > 0. Additionally, we study precise limits for privatized hypothesis testing and privatized quantum parameter estimation, including tight upper bounds on the quantum Fisher information under QDP. Finally, we establish near-optimal contraction bounds for differentially private quantum channels with respect to the hockey-stick divergence.

I. Introduction

A fundamental challenge in modern machine learning is the trade-off between privacy and information extraction. In this work, we explicitly treat both sides: privacy (ensuring that algorithmic outputs do not reveal significant information about the input data of the respondents) and the investigator's goal to extract as much useful information as possible from data for accurate learning and estimation. With the rapid advancement of machine learning, a key concern is ensuring the privacy of learning algorithms, meaning that their outputs should not reveal significant information about the input data. Differential privacy (DP) provides a rigorous mathematical framework to balance these opposing requirements. Accordingly, we structure our contributions in three steps: the first step (privacy), the second step (information extraction under privacy constraints), and the third step, the quantum channel setup, where the situation is more complicated. We mark the transition to each step explicitly in the text.
This step develops the privacy side of the trade-off from the respondent's perspective by studying the stability [1], [2] of learning algorithms. From the respondent's viewpoint, privacy means that the inclusion or exclusion of their individual data should not materially affect the mechanism's output, so that they can contribute data without fear of singled-out inference. An algorithm is considered stable if its output does not change drastically when a single respondent's data is changed; this point-wise insensitivity is precisely the respondent-centric guarantee we seek.

algorithm, artificial intelligence, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2511.01467

Country: